fix `KeyError` raised by `add_files` when parquet file does not have column stats #1354
Would it be possible to test this?
I think this change is good since there is a fair chance that the fields are not in the map. Keep in mind, when you add Parquet files that don't have any stats, they will be included in every table scan as PyIceberg has no information to decide if the file is relevant for the query.
Thanks @Fokko! Yes, that did come to mind. I was also wondering whether it's possible to create the stats on the fly, but thought that might be left as an enhancement. OK, let me try to add some unit tests.
@Fokko I have added the unit test; hopefully it's up to the mark.
@binayakd Thanks for adding these tests 👍 Can you run |
added a few nit comments, thanks for the contribution!
@kevinjqliu, @Fokko, pushed the linting fix, and also updated the test based on the suggestions. Thank you!
generally LGTM, have a few nit comments
Pushed a change to fix the Python 3.9 compatibility and updated the test based on the comment, @kevinjqliu. Thanks!
last nit comment about test readability :)
@kevinjqliu pushed the test readability fix. Thanks!
Gentle ping @kevinjqliu, so we can wrap the 0.8.1 release up :)
LGTM! Thanks for the contribution :)
…olumn stats (apache#1354) * fix KeyError, by switching del to pop * added unit test * update test * fix python 3.9 compatibility, and refactor test * update test
* use the non-deprecated func (#1326) * 0.8.0 post release steps (#1334) * add * fix mkdoc * Drop upper bounds for fsspec and it's implementations (#1341) * Drop upper bounds for fsspec and it's implementations * Run poetry lock * Ignore tables without `table_type` from Glue and Hive * Ignore tables without table_type parameters while loading all iceberg table from Glue and Hive catalog (#1331) * Use TABLE_TYPE --------- Co-authored-by: Wenzhuo Zhao <zhaowenzhuo01@bilibili.com> * Replace reference of `Table.identifier` with `Table.name` (#1346) * fix Table.name * replace Table.identifier with Table.name * add warning filter * Allow leading underscore in column name used in row filter (#1358) * Update parser.py Allow leading underscore in column name used in row filter. * Update test_parser.py * Update test_parser.py * Update test_parser.py * Remove Python 3.13 upper bound restriction (#1355) * Remove Python 3.13 upper bound restriction * Fix missing poetry.lock file * Upgrading numpy on the poetry.lock file from v1.26.0 to v1.26.4 * Improve documentation for "how to release" (#1359) * initial update * edits * add gpg instructions * verify artifacts * add twine not * grammar * edits * remove old artifacts * update doc workflow action * and name * add docs on patch vs major/minor release * fix `KeyError` raised by `add_files` when parquet file doe not have column stats (#1354) * fix KeyError, by switching del to pop * added unit test * update test * fix python 3.9 compatibility, and refactor test * update test * bump to 0.8.1 * Add instruction for patch release (#1373) * add instruction for patch release * create branch from tag * Write `null` when there is no parent-snapshot-id (#1383) --------- Co-authored-by: Sumanth <33193748+sumanth-manchala@users.noreply.github.com> Co-authored-by: gitzwz <72312233+gitzwz@users.noreply.github.com> Co-authored-by: Wenzhuo Zhao <zhaowenzhuo01@bilibili.com> Co-authored-by: vincenzon <mvcalder@gmail.com> Co-authored-by: Luca Bigon 
<luca.bigon@bauplanlabs.com> Co-authored-by: Binayak Dasgupta <binayakd86@gmail.com> Co-authored-by: Fokko Driesprong <fokko@apache.org>
Resolves #1353 by switching `del` to `pop` to prevent the `KeyError`.
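The difference between `del` and `pop` can be illustrated with a minimal sketch. It assumes a plain dict keyed by field ID stands in for a column-stats map. The names `remove_with_del`, `remove_with_pop`, and `stats` are hypothetical and are not taken from the PyIceberg source.

```python
from typing import Dict


def remove_with_del(stats: Dict[int, bytes], field_id: int) -> None:
    # `del` raises KeyError when the key is absent, e.g. when the
    # Parquet file carried no statistics for this column.
    del stats[field_id]


def remove_with_pop(stats: Dict[int, bytes], field_id: int) -> None:
    # `pop` with a default removes the key if present and is a no-op otherwise.
    stats.pop(field_id, None)


# Hypothetical stats map: only field 1 has an entry, field 2 has none.
stats = {1: b"\x00"}

try:
    remove_with_del(stats, 2)
    del_raised = False
except KeyError:
    del_raised = True

remove_with_pop(stats, 2)  # missing key: nothing happens
remove_with_pop(stats, 1)  # present key: entry is removed
```

Passing `None` as the default to `pop` is what makes the call safe; `stats.pop(field_id)` without a default would still raise `KeyError` for a missing key.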